Machine Learning Bias, Statistical Bias, and Statistical Variance of Decision Tree Algorithms
Authors
Abstract
The term "bias" is widely used, and with different meanings, in the fields of machine learning and statistics. This paper clarifies the uses of this term and shows how to measure and visualize the statistical bias and variance of learning algorithms. Statistical bias and variance can be applied to diagnose problems with machine learning bias, and the paper shows four examples of this. Finally, the paper discusses methods of reducing bias and variance. Methods based on voting can reduce variance, and the paper compares Breiman's bagging method and our own tree randomization method for voting decision trees. Both methods uniformly improve performance on data sets from the Irvine repository. Tree randomization yields perfect performance on the Letter Recognition task. A weighted nearest neighbor algorithm based on the infinite bootstrap is also introduced. In general, decision tree algorithms have moderate-to-high variance, so an important implication of this work is that variance, rather than appropriate or inappropriate machine learning bias, is an important cause of poor performance for decision tree algorithms.
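As a rough illustration of what measuring statistical bias and variance can look like, the sketch below trains a learner on many independently drawn training sets and splits its squared error at fixed test points into squared bias and variance. It is not the paper's procedure: the squared-error regression setting, the synthetic sine target, the one-split tree (stump), and every function name here (`true_fn`, `fit_stump`, `fit_bagged`, `bias_variance`) are assumptions made for the example. The voted variant averages stumps fit on bootstrap resamples, in the spirit of bagging.

```python
import numpy as np

def true_fn(x):
    # Hypothetical noise-free target, chosen only for this illustration.
    return np.sin(2 * np.pi * x)

def fit_stump(x, y):
    """Fit a one-split regression tree; return (threshold, left_mean, right_mean)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.inf, xs[0], ys.mean(), ys.mean())
    for i in range(1, len(xs)):
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, 0.5 * (xs[i - 1] + xs[i]), left.mean(), right.mean())
    return best[1:]

def predict_stump(model, x):
    threshold, left_mean, right_mean = model
    return np.where(x < threshold, left_mean, right_mean)

def fit_bagged(x, y, rng, n_votes=15):
    """Fit stumps on bootstrap resamples of the training set (bagging-style)."""
    models = []
    for _ in range(n_votes):
        idx = rng.integers(0, len(x), size=len(x))
        models.append(fit_stump(x[idx], y[idx]))
    return models

def predict_bagged(models, x):
    # For regression, "voting" is taken here to mean averaging the predictions.
    return np.mean([predict_stump(m, x) for m in models], axis=0)

def bias_variance(fit, predict, n_runs=100, n_train=30, noise=0.3, seed=0):
    """Estimate squared bias, variance, and squared error against the noise-free target."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 19)
    preds = np.empty((n_runs, len(x_test)))
    for r in range(n_runs):
        # Each run draws a fresh training set from the same distribution.
        x = rng.uniform(0, 1, n_train)
        y = true_fn(x) + rng.normal(0, noise, n_train)
        preds[r] = predict(fit(x, y, rng), x_test)
    target = true_fn(x_test)
    bias_sq = ((preds.mean(axis=0) - target) ** 2).mean()
    variance = preds.var(axis=0).mean()  # population variance across runs
    mse = ((preds - target) ** 2).mean()
    return bias_sq, variance, mse

if __name__ == "__main__":
    single = bias_variance(lambda x, y, rng: fit_stump(x, y), predict_stump)
    bagged = bias_variance(fit_bagged, predict_bagged)
    print("single stump (bias^2, variance, mse):", single)
    print("bagged stumps (bias^2, variance, mse):", bagged)
```

Because the error is measured against the noise-free target and the variance is the population variance across runs, squared bias plus variance equals the mean squared error exactly in this sketch. Comparing the two printed triples gives one way to see whether voting changes the variance term for this particular learner and data distribution.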
Similar resources
Evaluating the Performance of the Decision Tree Model in Estimating River Suspended Sediment (Case Study: Ilam Dam Basin)
Accurately estimating the volume of sediment carried by rivers is very important in water projects. Indeed, finding the best ways to calculate sediment discharge has been the objective of most research projects. Among these methods are machine learning methods such as decision tree models (which are based on the principles of learning). Decision...
Statistical Sources of Variable Selection Bias in Classification Tree Algorithms Based on the Gini Index
Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable selection bias in classification tree algorithms based on the Gini Index can be caused not only by the statistical effect of multiple comparisons, but also by an increasing estimation bias and variance of the spli...
Appendix: Machine Learning Bias Versus Statistical Bias
is if and 0 if. This high variance may help to explain why there is selection pressure for weak (machine learning) bias when the (machine learning) bias correctness is low. The reason that statisticians are interested in (statistical) bias and variance is that squared error is equal to the sum of squared (statistical) bias and variance. Therefore minimal (statistical) bias and minimal variance ...